High-Dimensional Ising Model Selection Using ℓ1-Regularized Logistic Regression

Authors

  • PRADEEP RAVIKUMAR
  • MARTIN J. WAINWRIGHT
  • JOHN D. LAFFERTY
Abstract

We consider the problem of estimating the graph associated with a binary Ising Markov random field. We describe a method based on ℓ1-regularized logistic regression, in which the neighborhood of any given node is estimated by performing logistic regression subject to an ℓ1-constraint. The method is analyzed under high-dimensional scaling in which both the number of nodes p and maximum neighborhood size d are allowed to grow as a function of the number of observations n. Our main results provide sufficient conditions on the triple (n, p, d) and the model parameters for the method to succeed in consistently estimating the neighborhood of every node in the graph simultaneously. With coherence conditions imposed on the population Fisher information matrix, we prove that consistent neighborhood selection can be obtained for sample sizes n = Ω(d^3 log p) with exponentially decaying error. When these same conditions are imposed directly on the sample matrices, we show that a reduced sample size of n = Ω(d^2 log p) suffices for the method to estimate neighborhoods consistently. Although this paper focuses on the binary graphical models, we indicate how a generalization of the method of the paper would apply to general discrete Markov random fields.
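The neighborhood-selection idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the 4-node chain graph, the coupling value 0.5, the sample size, the penalty `lam=0.1`, the plain proximal-gradient (ISTA) solver, and the rule of keeping an edge only when both endpoints select each other are all assumptions chosen for this toy example. It relies only on the fact that, for variables in {-1, +1} with no node-wise field, the conditional log-odds of one node given the rest are twice a linear function of the other nodes.

```python
import itertools
import numpy as np

def sample_ising_exact(theta, n, rng):
    """Draw n exact samples from a small zero-field Ising model by enumerating all 2^p states."""
    p = theta.shape[0]
    states = np.array(list(itertools.product([-1, 1], repeat=p)))
    # theta is symmetric with zero diagonal, so 0.5 * x^T theta x counts each edge once.
    energy = 0.5 * np.einsum("ki,ij,kj->k", states, theta, states)
    probs = np.exp(energy - energy.max())
    probs /= probs.sum()
    return states[rng.choice(len(states), size=n, p=probs)]

def l1_logistic(Z, y, lam, n_iter=500):
    """ISTA for mean(log(1 + exp(-2 * y * (Z @ w)))) + lam * ||w||_1 (hypothetical helper)."""
    n, d = Z.shape
    w = np.zeros(d)
    step = 1.0 / d  # safe step: the Hessian is bounded by mean(z z^T), whose trace is d
    for _ in range(n_iter):
        margin = 2.0 * y * (Z @ w)
        grad = -(2.0 * y / (1.0 + np.exp(margin))) @ Z / n
        w = w - step * grad
        # Soft-thresholding produces exact zeros for unselected coordinates.
        w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)
    return w

def estimate_edges(X, lam):
    """Regress each node on all others; keep edge (s, t) only if s and t select each other."""
    n, p = X.shape
    support = np.zeros((p, p), dtype=bool)
    for s in range(p):
        others = [t for t in range(p) if t != s]
        w = l1_logistic(X[:, others], X[:, s], lam)
        support[s, others] = w != 0
    both = support & support.T
    return {(s, t) for s in range(p) for t in range(s + 1, p) if both[s, t]}

# Toy example (assumed values): a chain 0 - 1 - 2 - 3 with coupling 0.5 on each edge.
rng = np.random.default_rng(0)
p = 4
theta = np.zeros((p, p))
for s in range(p - 1):
    theta[s, s + 1] = theta[s + 1, s] = 0.5
X = sample_ising_exact(theta, n=20000, rng=rng)
edges = estimate_edges(X, lam=0.1)
print(sorted(edges))
```

Exact enumeration is only feasible for very small p; in the high-dimensional regime the abstract describes, samples would come from data, and the choice of penalty would need to follow the scaling conditions analyzed in the paper rather than the fixed value assumed here.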

Similar resources

High-Dimensional Ising Model Selection Using ℓ1-Regularized Logistic Regression

We consider the problem of estimating the graph associated with a binary Ising Markov random field. We describe a method based on l1-regularized logistic regression, in which the neighborhood of any given node is estimated by performing logistic regression subject to an l1-constraint. The method is analyzed under high-dimensional scaling, in which both the number of nodes p and maximum neighbor...

Fast Bayesian Feature Selection for High Dimensional Linear Regression in Genomics via the Ising Approximation

Feature selection, identifying a subset of variables that are relevant for predicting a response, is an important and challenging component of many methods in statistics and machine learning. Feature selection is especially difficult and computationally intensive when the number of variables approaches or exceeds the number of samples, as is often the case for many genomic datasets. Here, we in...

High-Dimensional Graphical Model Selection Using l1-Regularized Logistic Regression

We consider the problem of estimating the graph structure associated with a discrete Markov random field. We describe a method based on l1-regularized logistic regression, in which the neighborhood of any given node is estimated by performing logistic regression subject to an l1-constraint. Our framework applies to the high-dimensional setting, in which both the number of nodes p and maximum ne...

Bayesian feature selection for high-dimensional linear regression via the Ising approximation with applications to genomics

Motivation: Feature selection, identifying a subset of variables that are relevant for predicting a response, is an important and challenging component of many methods in statistics and machine learning. Feature selection is especially difficult and computationally intensive when the number of variables approaches or exceeds the number of samples, as is often the case for many genomic datasets. ...

Model selection for linear classifiers using Bayesian error estimation

Regularized linear models are important classification methods for high dimensional problems, where regularized linear classifiers are often preferred due to their ability to avoid overfitting. The degree of freedom of the model is determined by a regularization parameter, which is typically selected using counting based approaches, such as K-fold cross-validation. For large data, this can be v...

Publication date: 2010